Jieqiong Zhao, Purdue University, zhao413@purdue.edu
Mahesh Babu Gorantla, Purdue University, mgorantl@purdue.edu
Junghoon Chae, Purdue University, jchae@purdue.edu
Benjamin Ahlbrand, Purdue University, bahlbran@purdue.edu
Hanye Xu, Purdue University, xu193@purdue.edu
Siqiao Chen, Purdue University, chen1722@purdue.edu
Guizhen Wang, Purdue University, wang1908@purdue.edu
Jiawei Zhang, Purdue University, zhan1486@purdue.edu
Abish Malik, Purdue University, amalik@purdue.edu
Sungahn Ko, Purdue University, ko@purdue.edu
David Ebert, Purdue University, ebertd@purdue.edu
Student Team: No.
Did you use data from both mini-challenges? Yes.
Tableau, R, MS Excel, our custom-designed system
Approximately how many hours were spent working on this submission in total? 100 hours
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? Yes
Video Download
Video: http://pixel.ecn.purdue.edu:8080/~zhan1486/VASTCHALLENGE15/MC2.wmv
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Questions
MC2.1 – Identify those IDs that stand out for their large volumes of communication. For each of these IDs
a. Characterize the communication patterns you see.
b. Based on these patterns, what do you hypothesize about these IDs?
Limit your response to no more than 4 images and 300 words.
The data show two IDs with an abnormally large amount of communication. We use our system to sort the IDs in descending order by total number of messages. This process is shown below:
Figure 1: Operation flow of the analysis: (1) choose a specific time range; (2) view the user list ordered by communication volume; (3) select a user and examine his/her detailed time-series pattern.
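The ranking step in Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of our system: the record layout (timestamp, sender, recipient, location), the sample rows, and the function name `rank_senders` are all hypothetical, and the sketch counts sent messages only.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: (timestamp, sender ID, recipient ID, location).
# The layout and the rows are invented for illustration.
records = [
    ("2014-06-06 12:00:00", "1278894", "1001", "Entry Corridor"),
    ("2014-06-06 12:00:00", "1278894", "1002", "Entry Corridor"),
    ("2014-06-06 12:01:00", "839736", "1001", "Entry Corridor"),
    ("2014-06-06 12:02:00", "1001", "1002", "Wet Land"),
    ("2014-06-06 15:00:00", "1278894", "1003", "Entry Corridor"),
]

FMT = "%Y-%m-%d %H:%M:%S"


def rank_senders(records, start, end):
    """Count sent messages per ID within [start, end), most active first."""
    lo, hi = datetime.strptime(start, FMT), datetime.strptime(end, FMT)
    counts = Counter(
        sender
        for ts, sender, _recipient, _location in records
        if lo <= datetime.strptime(ts, FMT) < hi
    )
    # most_common() sorts by count, descending.
    return counts.most_common()


ranking = rank_senders(records, "2014-06-06 12:00:00", "2014-06-06 13:00:00")
for sender, count in ranking:
    print(sender, count)
```

Selecting the top entry of `ranking` corresponds to step (3), where a single user is picked for closer inspection.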
The two IDs that stand out for their large volumes of communication, and their patterns, are as follows:
1. ID 1278894
a. At 12:00 PM each day, this person sends out a large group message. The person then sends another message to roughly the same number of people every 5 minutes until the next hour (1:00 PM), and then waits an hour before beginning again. This cycle of an hour of 5-minute-interval messages followed by an hour-long break continues through 9:00 PM (the last message is sent at 8:55 PM). On Friday and Saturday, the number of messages this person sends at 2:55 PM and 4:00 PM dips, likely due to the second performance of the day by Scott Jones.
b. We hypothesize that this person is a park employee sending out information to all park visitors. Since the number of messages the person sends seems to fluctuate with the number of people and the events in the park (Scott’s second performance in particular), we theorize that this person sends messages to park visitors who are not currently checked in, with information about which attractions are open or have short lines and/or other general park information.
2. ID 839736
a. This person's messages do not follow much of a pattern, but he/she sends between 5 and 20 messages every minute of each day. The only exception occurs at 12:00 PM on Sunday, when there is a huge spike of up to 1400 messages that slowly declines back to his/her normal level over the next 45 minutes.
b. We hypothesize that this person is also a park employee, one who deals with safety and security issues, given the constant activity throughout the weekend. The spike in communication on Sunday likely results from the crime involving Scott Jones, which this person would be responsible for handling.
Figure 2: Communication patterns of the two users with the highest message volume (user 1278894 and user 839736).
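One way to check the periodic pattern described for ID 1278894 is to look at the gaps between consecutive send times. The sketch below is a minimal illustration with invented send times that mimic two of the hourly bursts; the helper name `send_gaps` and the `HH:MM` input format are assumptions.

```python
from datetime import datetime


def send_gaps(timestamps):
    """Return gaps in minutes between consecutive distinct send times."""
    # A group message repeats the same timestamp once per recipient,
    # so duplicates are collapsed before measuring gaps.
    times = sorted({datetime.strptime(t, "%H:%M") for t in timestamps})
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]


# Invented send times: one message every 5 minutes from 12:00 to 12:55,
# an hour of silence, then the same again from 14:00 to 14:55.
burst_1 = [f"12:{m:02d}" for m in range(0, 60, 5)]
burst_2 = [f"14:{m:02d}" for m in range(0, 60, 5)]
gaps = send_gaps(burst_1 + burst_2 + ["12:00"])  # extra "12:00" = second recipient

print(gaps)
```

Under these assumptions, a sender following the described schedule produces runs of 5-minute gaps separated by a single 65-minute gap (12:55 to 14:00), whereas a sender like ID 839736, who messages every minute, would show no such structure.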
MC2.2 – Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime.
Limit your response to no more than 10 images and 1000 words.
The patterns we were able to find in the communications data are described below:
1. There is a spike in communications from Coaster Alley at 11:00 AM and 4:00 PM on Friday and Saturday. Sunday only shows a spike at 11:00 AM. Scott’s shows begin at 10:00 AM and 3:00 PM, so the flurry of activity from Coaster Alley, where the performance stage is located, likely marks the end of Scott’s shows. The spike in messages from Coaster Alley is accompanied by a large increase in messages to Coaster Alley as well. This could be the result of people responding quickly to the messages sent from the location, but is more likely the result of people contacting friends who just saw the show, or of attendees trying to find friends who were separated from them in the stage area or who did not attend the show.
Figure 3: Communication data over time for Coaster Alley for Friday (Left), Saturday (Middle), and Sunday (Right).
2. Small groups generally communicate with other small groups. Our tool groups people who communicate frequently into a cluster. By looking at the IDs in a cluster, we find that many of them also travel together. In fact, there are several clusters of 7 or 8 people that contain smaller groups of 2-3 people who travel together but communicate with everyone in the cluster. This could result from small groups breaking off from a bigger group while in the park, or from people who meet new friends and talk to them throughout the day.
Figure 4: A cluster of people who travel separately but still communicate among themselves. (Left) Clustering individuals based on communication data on Friday. (Middle) Clustering these individuals based on their travel sequence patterns. (Right) List of IDs in the selected cluster.
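To make the idea of a communication cluster concrete, the sketch below groups IDs into connected components of pairs that exchange at least a minimum number of messages. This is a deliberately simple stand-in, not the clustering method used in our tool; the threshold `min_messages`, the function name, and the sample IDs are all hypothetical.

```python
from collections import Counter, defaultdict


def communication_clusters(messages, min_messages=2):
    """Group IDs into connected components of frequently messaging pairs."""
    # Count messages per unordered pair, ignoring direction.
    pair_counts = Counter(frozenset((a, b)) for a, b in messages if a != b)

    # Keep only pairs that talk often enough and build an adjacency map.
    neighbours = defaultdict(set)
    for pair, n in pair_counts.items():
        if n >= min_messages:
            a, b = tuple(pair)
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Depth-first search to collect each connected component.
    clusters, seen = [], set()
    for start in neighbours:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Invented (sender, recipient) pairs.
messages = [
    ("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"),
    ("D", "E"), ("E", "D"),
    ("A", "F"),  # a single message: below the threshold, so F is left out
]
clusters = communication_clusters(messages)
print(sorted(sorted(c) for c in clusters))
```

Comparing such clusters with groups derived from movement data is one way to see whether people who message each other also travel together.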
3. People who travel together (based on our attraction attendance clustering method) almost always communicate by messaging. Although it seems unusual that people standing right next to each other would exchange messages, these groups likely message each other when they get separated or when they visit different attractions during the day. By studying these clusters, we find that many people who travel together are also grouped together by their frequent communication. Also, the time series graphs of total messages sent to and from the different locations look very similar. These results confirm that people are sending messages to others travelling with them and located in the same region of the park.
4. On each of the three days, Wet Land is the largest source of messages sent to external recipients. This makes the most sense for Sunday, since people are likely to message people they know outside the park about the crime as they observe the activity around the crime scene. People are also likely to contact outsiders about the police investigation going on that day. The popularity of external messages on Friday and Saturday may come from people sending picture messages and/or descriptions of Scott’s memorabilia to people they know who are not at the park.
Figure 5: Communication traffic by region to external people on the three days of the event.
5. Wet Land is also the most popular area for people in the park to send and receive messages. After removing the heavy afternoon flux of messages from and to ID 1278894 at the Entry Corridor, Wet Land leads in messages sent throughout the day. This is evident in the image below, which shows the total message time graphs with that single ID filtered out. Although there are several rides in Wet Land, its heavy role in communications is likely the result of the Beer Garden located there and, of course, the pavilion. Most people visit the pavilion at some point in the day and are likely to communicate about the interesting memorabilia they are observing with the other members of their travelling group and their messaging group (communication cluster).
Figure 6: Communication traffic over time on Sunday (Top), Saturday (Middle), and Friday (Bottom). We find that Wet Land is the most popular area for people to communicate.
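The effect of filtering out a single high-volume ID before comparing regions can be sketched as follows. The record layout (sender, recipient, location), the sample rows, and the helper name `messages_by_location` are assumptions made for illustration only.

```python
from collections import Counter


def messages_by_location(records, exclude_ids=()):
    """Count messages per location, skipping any that involve an excluded ID."""
    excluded = set(exclude_ids)
    return Counter(
        location
        for sender, recipient, location in records
        if sender not in excluded and recipient not in excluded
    )


# Invented records: (sender, recipient, location).
records = [
    ("1278894", "2001", "Entry Corridor"),
    ("1278894", "2002", "Entry Corridor"),
    ("1278894", "2003", "Entry Corridor"),
    ("2001", "2002", "Wet Land"),
    ("2002", "2001", "Wet Land"),
    ("2003", "2004", "Coaster Alley"),
]

raw = messages_by_location(records)
filtered = messages_by_location(records, exclude_ids={"1278894"})
print(raw.most_common(1), filtered.most_common(1))
```

In this toy data the Entry Corridor leads only because of the broadcast sender; once that ID is excluded, Wet Land becomes the busiest region, which mirrors the observation in pattern 5.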
6. Although there is a spike in Coaster Alley communications at the end of Scott’s shows (Pattern 1), there is a lull in communication in all areas of the park during the shows. The lull is less evident for 10:00-11:00 AM, when there are fewer people in the park, but the time series graph of sent messages for 3:00-4:00 PM clearly shows that communications drop in every region at these times each day (except Sunday afternoon). The drop likely occurs because a large number of people in the park move to the performance stage to watch Scott’s show and, once the show starts, stop communicating.
Figure 7: The number of (distinct) people who send/receive messages over time on the three days.
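The quantity plotted in Figure 7, the number of distinct people active in each time interval, can be computed along these lines. This is a minimal sketch: the `HH:MM` timestamp format, the one-hour bin width, and the sample records are assumptions.

```python
from collections import defaultdict


def distinct_people_per_bin(records, bin_minutes=60):
    """Map each bin (start, in minutes since midnight) to its count of distinct IDs."""
    people = defaultdict(set)
    for hhmm, sender, recipient in records:
        hours, minutes = map(int, hhmm.split(":"))
        minute_of_day = hours * 60 + minutes
        bin_start = minute_of_day - minute_of_day % bin_minutes
        # Both the sender and the recipient count as active in this bin.
        people[bin_start].update((sender, recipient))
    return {start: len(ids) for start, ids in sorted(people.items())}


# Invented records: (time, sender, recipient).
records = [
    ("14:10", "A", "B"), ("14:40", "A", "C"),
    ("15:05", "A", "B"), ("15:30", "B", "A"),
    ("16:02", "A", "B"), ("16:20", "C", "D"), ("16:45", "E", "F"),
]
active = distinct_people_per_bin(records)
print(active)
```

Counting distinct IDs rather than raw messages keeps a single broadcast sender from dominating the curve, so a dip in the 15:00 bin of this toy data reflects fewer people communicating, as in the lull described in pattern 6.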
7. From the above time series graph, we can infer that park operations are in full swing by around 11:00 AM, which is also the time at which the athlete gives his first speech of the day. Analyzing the peaks, we found that starting at 12:00 PM every day, the person with ID 1278894, whom we assume to be a park official, sends messages to almost every person in the park every five minutes for an hour and then waits an hour before beginning again. This pattern continues until 9:00 PM every day.
8. People apparently respond to the messages sent by the park employees (IDs 1278894 and 839736) and enter into conversations with these administrators. The increase in messages sent from the Entry Corridor in the afternoon by ID 1278894 corresponds to an increase in messages to the Entry Corridor during each hour the employee sends out messages. Also, by looking at individual ID communication data, as shown below, we see people sending messages back and forth with ID 839736. The spike of messages from the Entry Corridor at 12:00 PM on Sunday from ID 839736 is also paralleled by a spike of messages from other areas in response. These park employees are clearly interacting with visitors rather than simply distributing general information.
Figure 8: Number of messages sent from the Entry Corridor (Left) and received from the different locations (Right) for the three days.
MC2.3 – From this data, can you hypothesize when the crime was discovered? Describe your rationale.
Limit your response to no more than 3 images and 300 words.
The total number of messages sent in the park shows how actively people are communicating at the observed times. To study this across the three days and find unusual patterns possibly related to the crime, we used time series graphs of total messages from park visitors on each day of the weekend. These graphs indicate similar patterns throughout each day, except between 11:30 AM and 12:15 PM on Sunday. During this window there is a spike in communications in several different areas of the park. The spike first occurs in Wet Land, which reaches an unusual number of messages for that area.
Figure 9: Communication patterns on Sunday for Wet Land.
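The comparison behind this observation, Sunday's message counts against the same time bins on Friday and Saturday, can be sketched as below. The counts are invented to illustrate the shape of the comparison and are not taken from the data; the 15-minute bins, the `ratio` threshold, and the function name `unusual_bins` are likewise assumptions.

```python
def unusual_bins(sunday, baseline_days, ratio=2.0):
    """Return bins where Sunday's count exceeds `ratio` times the baseline mean."""
    flagged = []
    for time_bin, count in sunday.items():
        baseline = [day.get(time_bin, 0) for day in baseline_days]
        mean = sum(baseline) / len(baseline)
        if mean > 0 and count > ratio * mean:
            flagged.append(time_bin)
    # Zero-padded "HH:MM" labels sort chronologically as plain strings.
    return sorted(flagged)


# Invented message counts per 15-minute bin for one region.
friday = {"11:00": 100, "11:15": 110, "11:30": 105, "11:45": 100, "12:00": 120, "12:15": 115}
saturday = {"11:00": 104, "11:15": 108, "11:30": 110, "11:45": 104, "12:00": 125, "12:15": 118}
sunday = {"11:00": 105, "11:15": 115, "11:30": 300, "11:45": 340, "12:00": 280, "12:15": 120}

flagged = unusual_bins(sunday, [friday, saturday])
print(flagged)
```

The first flagged bin marks the start of the anomaly, which is the kind of evidence we rely on when placing the discovery of the crime at the beginning of the communication spike.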
On the map, we see that the pavilion entrance is located in Wet Land. Since the crime occurred at the pavilion, we conclude that the crime was discovered at around 11:30 AM and that the half-hour flurry of communications from that region that followed was the result of the discovery.
Also, using the heat map feature in our tool, the movement data show that there are no check-ins to the pavilion after 12:00 PM. At 12:00 PM, we see a large spike in communications from the Entry Corridor. Since IDs 839736 and 1278894 (the only known park employees sending messages) send all their messages from the Entry Corridor, we infer that they likely sent messages notifying visitors that the pavilion was closed at that time. Thus, since the pavilion was closed for the police to conduct their investigation (as stated in the news article), this further supports our conclusion that the crime occurred shortly before 12:00 PM and was discovered at 11:30 AM, the beginning of the communication spike.
Figure 10: No check-ins are observed in the pavilion after 12:00 PM on Sunday.
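The check behind Figure 10 amounts to finding the latest check-in recorded at a given attraction. A minimal sketch follows; it assumes movement records have already been mapped from coordinates to attraction names, and the record layout, sample rows, and function name `last_check_in` are hypothetical.

```python
def last_check_in(movements, attraction):
    """Return the latest check-in time at `attraction`, or None if there is none."""
    times = [
        time
        for time, _visitor, kind, place in movements
        if kind == "check-in" and place == attraction
    ]
    # Zero-padded "HH:MM" strings compare correctly as text.
    return max(times) if times else None


# Invented records: (time, visitor ID, record type, attraction).
movements = [
    ("09:15", "3001", "check-in", "Pavilion"),
    ("11:40", "3002", "check-in", "Pavilion"),
    ("12:30", "3003", "movement", "Pavilion"),   # walking past, not a check-in
    ("13:05", "3004", "check-in", "Beer Garden"),
]
print(last_check_in(movements, "Pavilion"))
```

A last check-in time well before the park closes, as in this toy data, is consistent with the attraction having been shut during the day.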